We consider entropy in Generalized Non-Signalling Theory (also known as box world) 
where the most common definition of entropy is the measurement entropy. In this setting, 
we completely characterize the set of allowed entropies for a bipartite state. We find that 
the only inequalities amongst these entropies are subadditivity and non-negativity. What is 
surprising is that non-locality does not play a role - in fact any bipartite entropy vector can 
be achieved by separable states of the theory. This is in stark contrast to the case of the von 
Neumann entropy in quantum theory, where only entangled states satisfy $S(AB) < S(A)$. 



I. INTRODUCTION 

Entropy is a crucial concept in both classical and quantum information theory. The Shannon 
entropy was originally introduced as a measure of the uncertainty of a random variable [1], and 
it turned out to have many applications in information theory, including optimal compression rates 
and channel capacities. Remarkably, the von Neumann entropy was introduced 20 years before 
the Shannon entropy, and was motivated by thermodynamic considerations [2]. It has found 
innumerable applications in quantum information theory, including its role as a measure of pure 
state entanglement [3, 4], and as the analogue of the Shannon entropy in many quantum coding 
theorems [5]. 

Given a multi-party quantum state $\rho$ one can compute the von Neumann entropy of its various 
reduced states, e.g. $S(A) := S(\rho_A)$, $S(AB) := S(\rho_{AB})$ etc., and so form the entropy vector of 
this state $\rho$. For example, for two-party states, the entropy vector is $(S(A), S(B), S(AB))$. 
For $N$ parties, the entropy vector lives in a real vector space of $2^N - 1$ dimensions. The 
question of which vectors can arise has been the subject of increasing interest recently, both in 
the quantum (von Neumann entropy) [6-8] and classical (Shannon entropy) [9-11] cases. 

For example, for two parties, both quantum entropies $S_q$ and classical entropies $S_c$ are 
non-negative and satisfy subadditivity: 

$$\begin{aligned} S_q(AB) &\le S_q(A) + S_q(B), \\ S_c(AB) &\le S_c(A) + S_c(B). \end{aligned} \tag{1}$$

However, the space of achievable entropy vectors is different for the classical and quantum cases. 
Whereas classical entropies $S_c$ satisfy monotonicity 

$$S_c(AB) \ge S_c(A), \tag{2}$$

quantum entropies are more general and only satisfy the weaker Araki-Lieb inequality [12] 

$$S_q(AB) \ge S_q(A) - S_q(B). \tag{3}$$

In particular, a vector such as (1, 1, 0) is achievable as a quantum entropy vector; it is the entropy 
vector of a singlet. However this vector is not achievable for any classical distribution; it does 
not satisfy (2). Thus the space of entropy vectors seems to capture some of the differences 
between classical and quantum states. Indeed study of the space of achievable entropy vectors 
is a powerful tool in investigating multi-party entanglement of quantum states. 
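
As a rough illustration of how inequalities (1)-(3) separate classical from quantum entropy vectors, the following Python sketch tests a bipartite vector against each of them. The function name `check_bipartite_vector` and the tolerance are hypothetical choices made for this example, and monotonicity and Araki-Lieb are applied in their symmetrized forms (with A and B exchanged), which is an assumption beyond what is written in (2) and (3).

```python
def check_bipartite_vector(v, tol=1e-12):
    """Test v = (S(A), S(B), S(AB)) against the inequalities of this section.

    Hypothetical helper: (2) and (3) are symmetrized over A and B here.
    """
    s_a, s_b, s_ab = v
    return {
        "non_negative": min(v) >= -tol,
        "subadditive": s_ab <= s_a + s_b + tol,      # inequality (1)
        "monotone": s_ab >= max(s_a, s_b) - tol,     # inequality (2), symmetrized
        "araki_lieb": s_ab >= abs(s_a - s_b) - tol,  # inequality (3), symmetrized
    }


# The singlet vector (1, 1, 0) passes every test except monotonicity.
singlet = check_bipartite_vector((1, 1, 0))
```

Under these assumptions the singlet vector is flagged as non-classical only by the monotonicity test, in line with the discussion above.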

Mathematically, the space of entropy vectors is a cone [6], and characterizing this cone is an 
important problem in classical and quantum information theory [13]. The problem is completely 
solved for three or fewer parties in the classical and quantum cases [H HI] (leading to different 
cones, of course); the cases of four or more parties (classical or quantum) remain open. One may 
understand a cone either by giving the inequalities or, dually, by the extremal rays. And, perhaps 
not surprisingly, points on these extremal rays typically correspond to interesting states. For 
example, for quantum entropy vectors of two parties the extremal rays include $\lambda(1,1,0)$, $\lambda \ge 0$; 
a point on this ray may be achieved by the singlet, as mentioned above. For three parties one 
of the extremal quantum rays may be achieved by the GHZ state (see also [7, 15]). 

With these observations in mind we turn now to so called "generalized probabilistic theories" 
(GPTs) [16]. These are theories which generalize classical and quantum theories, beginning from 
an operational viewpoint, where states are characterized by the output distributions of certain 
measurements. One aim of this field of research is to compare these more general theories with 
quantum theory, and in doing so gain some intuition as to 'why' Nature seems to prefer quantum 
theory. 

Attempts have been made to introduce an entropy function within these general theories. The 
most popular seems to be the measurement entropy which satisfies many desirable properties 
for an entropy function [17]: it reduces to the Shannon and von Neumann entropies in classical 
and quantum theories respectively; it is always non-negative; and it is concave. In certain (quite 
broad) classes of theories, it is also subadditive and continuous. 

Here we investigate features of the measurement entropy in 'generalized non-signalling theory' 
(GNST) [16] (also known as box world [18]) - the most famous and well studied GPT, which 
allows all non-local correlations that are non-signalling. Our first goal is to characterize the set 
of entropy vectors. It has already been noted that this entropy violates strong subadditivity and 
so the allowed entropy vectors are in some sense more general than the corresponding classical 
and quantum ones [Ej. Our initial thought was that the space of achievable entropy vectors 
in GNST would reflect and shed light on the way this theory generalizes classical and quantum 
states. 

We are able to completely determine the set of bipartite GNST entropy vectors (up to the 
closure). We find this set to be the cone in $\mathbb{R}^3$ cut out by the non-negativity and subadditivity of 
the entropy and no other inequalities. This is in contrast with classical probability and quantum 
theory, where the analogous set is smaller due to the monotonicity (classical) and Araki-Lieb 
(quantum) inequalities. 

What is very surprising, however, is that every entropy vector in GNST can be achieved by a 
separable state. This means that the measurement entropy is unable to detect non-locality. This 
is not true in quantum theory, where all separable states (but certainly not all states) satisfy 
the monotonicity relation $S(A) \le S(AB)$ [19]; thus one may say that those quantum entropy 
vectors that do not satisfy monotonicity are the "truly" quantum ones. 

The structure of the paper is as follows: in section 2 we briefly review GPTs; in section 3 we 
consider in some detail the allowed measurements in GNST; in section 4 we characterize the set 
of bipartite GNST entropy vectors; and in section 5 we consider the implications of this result. 
We close the paper with some concluding remarks. 

II. GENERALIZED PROBABILISTIC THEORIES 



It has been well known since Bell's theorem [20] that quantum theory admits correlations 
which are incompatible with any local classical theory. However, there are also correlations 
compatible with the no-signalling principle which cannot be produced by quantum theory. The 
most famous example is the PR-box [21]. 



[Figure: a bipartite box shared between parties A and B.] 



Here, two parties (Alice and Bob) each own part of a bipartite system. Alice and Bob choose 
inputs $x, y \in \{0, 1\}$ respectively; the values of $x$ and $y$ correspond to different measurements on 
their systems. They then obtain outcomes $a, b \in \{0, 1\}$ according to the distribution: 



$$p(a,b|x,y) = \begin{cases} \frac{1}{2} & \text{if } a \oplus b = xy \\ 0 & \text{else} \end{cases} \tag{4}$$



We know that quantum theory cannot produce such a distribution [52]. The motivation behind 
GPTs is to consider what kinds of physical theory could admit these and other general no- 
signalling correlations. 

In order to form a new physical theory, we assume that the state of a system is determined 
by the outcome distributions of measurements on the system. 

For an individual system, we assume that there is a set of $k$ measurements, each with $l$ 
outcomes, which determine the state uniquely. We call these the fiducial measurements. The 
state of the system is then a vector 



$$p = \begin{pmatrix} p(a = 0|x = 0) \\ p(a = 1|x = 0) \\ \vdots \\ p(a = l-2|x = k-1) \\ p(a = l-1|x = k-1) \end{pmatrix} \tag{5}$$



in a real vector space $V$. The values of $k$ and $l$ can vary from system to system (in the same way 
that different quantum systems have Hilbert spaces with different dimensions). For example, 
when $k = 1$ there is only one fiducial measurement, and we say that the system is classical since 
it is simply a classical random variable. 

For a composite system, we make the further assumption that the fiducial measurements 
are those performed by simultaneously performing a fiducial measurement on each individual 
subsystem. This means that if the state is composed of $n$ individual systems, then the state 
can be considered as a vector, $p$, with components $p(a_1, \dots, a_n|x_1, \dots, x_n)$, also denoted $p(\mathbf a|\mathbf x)$. 
Notice that $p$ naturally lives in the vector space $V_1 \otimes \cdots \otimes V_n$, where $V_i$ is the vector space 
containing the states of system $i$. 

We now have many different types of system (each system contains some number, n, of 
individual systems, each of which has its own values for $k$ and $l$). We obtain a physical theory 
by specifying the sets of allowed states on each type of system. These sets must be convex to 
allow state mixing, and must satisfy the normalization condition: 

$$\sum_{\mathbf a} p(\mathbf a|\mathbf x) = 1 \quad \forall \mathbf x \tag{6}$$

Further, all states must satisfy the no-signalling constraints: 

$$\sum_{a_i} p(a_1, \dots, a_i, \dots, a_n|x_1, \dots, x_i, \dots, x_n) = \sum_{a_i} p(a_1, \dots, a_i, \dots, a_n|x_1, \dots, x_i', \dots, x_n) \tag{7}$$

These constraints are important for two reasons: firstly, any state which violates these con- 
straints would allow superluminal signalling. Secondly, they allow us to define the reduced state 
of a multipartite system: 

$$p(\bar{\mathbf a}|\bar{\mathbf x}) := \sum_{a_i} p(\mathbf a|\mathbf x) \tag{8}$$

where e.g. $\bar{\mathbf a} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n)$, and we know that the sum does not depend on the 
value of $x_i$. 
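
The conditions above can be checked mechanically for a small example. The following Python sketch builds the PR-box of equation (4) and tests normalization (6) and no-signalling (7), then computes Alice's reduced state as in (8). The dictionary layout `p[(a, b, x, y)]` and all function names are assumptions made for this illustration only.

```python
import itertools


def pr_box():
    """PR-box of equation (4): probability 1/2 when a XOR b equals x*y, else 0."""
    return {(a, b, x, y): (0.5 if (a ^ b) == x * y else 0.0)
            for a, b, x, y in itertools.product((0, 1), repeat=4)}


def is_normalized(p, tol=1e-12):
    """Condition (6): the outcome probabilities sum to 1 for every input pair."""
    return all(abs(sum(p[a, b, x, y] for a in (0, 1) for b in (0, 1)) - 1) < tol
               for x in (0, 1) for y in (0, 1))


def reduced_state_alice(p, y):
    """Reduced state (8) on Alice's box, computed with Bob's input fixed to y."""
    return {(a, x): sum(p[a, b, x, y] for b in (0, 1))
            for a in (0, 1) for x in (0, 1)}


def reduced_state_bob(p, x):
    """Reduced state (8) on Bob's box, computed with Alice's input fixed to x."""
    return {(b, y): sum(p[a, b, x, y] for a in (0, 1))
            for b in (0, 1) for y in (0, 1)}


def is_no_signalling(p, tol=1e-12):
    """Condition (7): each marginal is independent of the other party's input."""
    alice0, alice1 = reduced_state_alice(p, 0), reduced_state_alice(p, 1)
    bob0, bob1 = reduced_state_bob(p, 0), reduced_state_bob(p, 1)
    return (all(abs(alice0[k] - alice1[k]) < tol for k in alice0)
            and all(abs(bob0[k] - bob1[k]) < tol for k in bob0))


box = pr_box()
```

For the PR-box both checks pass and each reduced state is uniform, so the reduced state in (8) is well defined regardless of the other party's input.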

These theories can then be extended to include general measurements (this will be discussed 
in more detail in section III) and transformations, where we make further, physically motivated 
assumptions. For a full discussion see [16]. 



III. GENERALIZED NON-SIGNALLING THEORY 

In this paper we consider a particular GPT known as 'generalized non-signalling theory' 
(GNST). GNST is the most general GPT in the sense that, for any type of system, the set of 
allowed states is all those which satisfy no-signalling. It is also known as 'box world' since we 
refer to individual systems in GNST as boxes. In this section we are especially interested in the 
measurements which the theory permits. 



Measurements in GNST 

Suppose we have a system with a set of allowed states $\mathcal{S}$. In a generalized probabilistic 
theory, an arbitrary measurement on the system (including, but not limited to, the fiducial 
measurements) has the following form: it is a set of pairs $(r, \mu_r)$, where $r$ is the outcome of the 
measurement, and $\mu_r$ is the corresponding effect. An effect is a linear map $\mu : \mathcal{S} \to [0, 1]$ (so 
that $\mu_r(p)$ is the probability that outcome $r$ is obtained when the measurement is performed 
on state $p$). To ensure that these probabilities always sum to 1, every measurement must satisfy 
$\sum_r \mu_r = u$, where $u$ is the constant map $u(p) = 1 \;\; \forall p \in \mathcal{S}$. 

In GNST, any linear function $\mu : \mathcal{S} \to [0, 1]$ is an allowed effect, and any set of effects $\{\mu_r\}$ 
which sum to the unit map is an allowed measurement. Here we review what is known about 
the set of measurements in GNST, and prove a slight generalization of a result in [18] which we 
will use in Section IV. 

Since effects are linear functionals, they must be of the form: 

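$$\mu_r(p) = \sum_{\mathbf a, \mathbf x} p(\mathbf a|\mathbf x) R_r(\mathbf a|\mathbf x) \tag{9}$$

for some vector $R_r$ (with entries indexed over $\mathbf a$ and $\mathbf x$). We say that $R_r$ represents $\mu_r$. However, 
note that there will be many vectors which represent each $\mu_r$. 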
The following lemma is crucial. 

Lemma 1 (Barrett [16, Appendix D]). Every effect $\mu$ can be represented by some vector $R$ such 
that $0 \le R(\mathbf a|\mathbf x) \le 1$ for all $\mathbf a, \mathbf x$. 

Now suppose we have a composite system of many boxes. One way in which we can perform 
a measurement is to do the following: 

• Choose one of the individual boxes and perform a fiducial measurement on that box. 

• Based on the outcome of this measurement, choose another box and perform a fiducial 
measurement on that box. 

• Repeat until all the boxes have been measured. 

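We call a measurement which has this form a basic measurement. The outcomes of a basic 
measurement are the outputs of the boxes, so that, in the language used above, the index $r$ runs 
over all output vectors $\mathbf a$. The probability of obtaining output $\mathbf a$ is equal to $p(\mathbf a|\mathbf x(\mathbf a))$, where 
$\mathbf x(\mathbf a)$ is the list of inputs which are entered in the measurement when outputs $\mathbf a$ are obtained. 
Thus, a vector representing the effect $\mu_{\mathbf a}$ is $R_{\mathbf a}$ with components 

$$R_{\mathbf a}(\mathbf a'|\mathbf x') = \begin{cases} 1 & \text{if } \mathbf a' = \mathbf a \text{ and } \mathbf x' = \mathbf x(\mathbf a) \\ 0 & \text{else} \end{cases} \tag{10}$$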
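
To make the notion of a basic measurement concrete, here is a minimal Python sketch for a two-box system, using the PR-box of equation (4) as the state. Alice's box is measured first with input `x`, and Bob's input is then chosen as a function of Alice's outcome. The names `pr_box`, `basic_measurement` and `y_given_a` are hypothetical and introduced only for this example.

```python
import itertools


def pr_box():
    """PR-box of equation (4), stored as p[(a, b, x, y)]."""
    return {(a, b, x, y): (0.5 if (a ^ b) == x * y else 0.0)
            for a, b, x, y in itertools.product((0, 1), repeat=4)}


def basic_measurement(p, x, y_given_a):
    """Outcome distribution of a basic measurement on two boxes.

    Alice's box is measured with input x; Bob's box is then measured with
    input y_given_a[a], where a is Alice's outcome. The probability of the
    output pair (a, b) is p(a, b | x, y(a)), i.e. p(a | x(a)) in the text.
    """
    return {(a, b): p[a, b, x, y_given_a[a]] for a in (0, 1) for b in (0, 1)}


# Alice inputs x = 1; Bob copies Alice's outcome into his input.
dist = basic_measurement(pr_box(), x=1, y_given_a={0: 0, 1: 1})
```

The resulting probabilities sum to one even though Bob's input depends on Alice's outcome; this relies on the no-signalling property of the state.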

In [18] it is shown that the only measurements which can be performed on systems of one or 
two boxes are basic measurements, or probabilistic mixtures of basic measurements. In section 
IV we will need the following generalization: 



Lemma 2. Let $A_1, A_2, B_1, \dots, B_n$ be a system of boxes, where only $A_1$ and $A_2$ are not classical. 
Then all measurements on this system are basic, or mixtures of basic measurements. 

The proof of this lemma is straightforward, and provided in the appendix. 



Maximally informative measurements 

A notion which will be important in the next section is that of a maximally informative 
or fine-grained measurement. Let $M = \{(r, \mu_r)\}$ and $N = \{(s, \nu_s)\}$ be two measurements, and 
denote their sets of possible outcomes by $O_M$ and $O_N$ respectively. We say that $N$ is a refinement 
of $M$ if $O_N$ can be partitioned into sets $P_r$ such that, for each $r$, $\mu_r = \sum_{s \in P_r} \nu_s$. In this case, 
$N$ can be used to perform $M$ (by performing $N$ and returning $r$ such that the outcome $s$ is 
in the set $P_r$). The refinement is trivial if $\nu_s \propto \mu_r$ whenever $s \in P_r$. If $M$ has no non-trivial 
refinement, then no other measurement reveals strictly more information about the state, and 
hence we call $M$ maximally informative, or fine-grained. 

The following lemma gives an important characterization of maximally informative measurements 
in GNST. Although the result may seem obvious in view of lemma 1, the proof requires 
a little effort and can be found in the appendix. 

Lemma 3. A GNST measurement $M = \{(r, \mu_r)\}$ is maximally informative if and only if every 
effect $\mu_r$ can be represented by a vector with only one non-zero entry, which is between 0 and 1. 

Remark. Lemma 3 ensures that basic measurements are maximally informative. 

Suppose we have a composite system of subsystems $X$ and $Y$ (which may themselves be 
composite systems) and measurements $M_X$ and $M_Y$ on each system. One way to perform a 
measurement on the composite system would be to perform $M_X$ on $X$ and $M_Y$ on $Y$ independently. 
This is always an allowed measurement and we denote it $M_X \otimes M_Y$. Precisely, if $M_X$ 
has effects $\mu_r$ represented by vectors $R_r$ and $M_Y$ has effects $\nu_s$ represented by vectors $R_s$, then 
$M_X \otimes M_Y$ has effects, indexed by pairs $(r, s)$, which can be represented by vectors $R_r \otimes R_s$. 
(The latter $\otimes$ here is a genuine tensor product of vectors, as mentioned in section II.) 

Corollary 4. If $M_X$ is a maximally informative measurement on a box $X$ and $M_Y$ is maximally 
informative on $Y$, then $M_X \otimes M_Y$ is a maximally informative measurement on the composite 
system $XY$. 

Proof. If $R_X$ and $R_Y$ are vectors with one non-zero entry then so is $R_X \otimes R_Y$. Lemma 3 then 
implies the result. □ 



IV. ENTROPY IN GNST 

We are now in a position to introduce the entropy function which we will study in this paper. 
The measure of entropy we use is the measurement entropy, H, which is defined as follows. 
Suppose we have a state p on a system X. Then, 

$$H(X)_p := \inf_M H_M(X)_p \tag{11}$$

where $H_M(X)_p$ is the Shannon entropy of the outcomes of measurement $M$ on system $X$ with 
state $p$, and the infimum is taken over the set of all maximally informative measurements. 
When it is clear which state we are referring to we will omit the subscript $p$ from the notation. 
The motivation for such a definition of entropy comes from the fact that in classical and quantum 
theories it is none other than the Shannon and von Neumann entropies respectively. It has 
previously been studied in I23j. Lemma 3 implies that in GNST the infimum in (11) can be 
replaced by a minimum. 

It is clear from the definition that the measurement entropy is always non-negative, since the 
Shannon entropy is non-negative. Before we can proceed with the main argument of the paper, 
we require two simple lemmas. 

Lemma 5. Measurement entropy in GNST is subadditive: for any state of a joint system $XY$ 
we have that $H(XY) \le H(X) + H(Y)$. 

Proof. Suppose that $M_X$ is the measurement on system $X$ which achieves $H(X)$, and $M_Y$ is the 
measurement on system $Y$ achieving $H(Y)$. From section III we know that $M := M_X \otimes M_Y$ is 
a maximally informative measurement on system $XY$. Thus, 

\begin{align}
H(XY) &\le H_M(XY) \tag{12} \\
&\le H_{M_X}(X) + H_{M_Y}(Y) \tag{13} \\
&= H(X) + H(Y) \tag{14}
\end{align}

where the second inequality follows from the subadditivity of the Shannon entropy. □ 

Remark. Note that this proof applies to any GPT in which the analogue of Corollary 4 holds. 
This argument was presented in [17]. 



Lemma 6. If we restrict M in (11) to include only basic measurements, then H is additive on 
product states. 

Proof. Let $p(\mathbf a, \mathbf b|\mathbf x, \mathbf y) = p(\mathbf a|\mathbf x)\, p(\mathbf b|\mathbf y)$ be a product state of a joint system $XY$ (where $X$ is 
composed of $n$ boxes, and $Y$ is composed of $m$ boxes). We proceed by induction on $n + m$. 

Case $n + m = 1$. Wlog $n = 1$, $m = 0$. Then $H(XY) = H(X) = H(X) + H(Y)$. 

Case $n + m \ge 2$. Let $M$ be a measurement which achieves $H(XY)$. Since $M$ is a basic 
measurement it must begin by performing a fiducial measurement on one of the individual 
boxes. Wlog assume that $M$ begins by performing measurement $x_1$ on box $X_1$. Denote the 
Shannon entropy of the outcome of this measurement by $H_{x_1}(X_1)$. Now suppose that we have 
performed the measurement, and the result $a_1$ is known. Let $q_{a_1,x_1}$ be the remaining distribution 
on $X_2 \cdots X_n Y$. Then from the rules of conditional probability: 

$$q_{a_1,x_1}(a_2, \dots, a_n, \mathbf b|x_2, \dots, x_n, \mathbf y) = \frac{p(\mathbf a, \mathbf b|\mathbf x, \mathbf y)}{p(a_1|x_1)} \tag{15}$$

and notice that this is still a product state. Denote the remainder of the measurement, which 
has not yet been performed, by $M_{a_1}$. This is a basic measurement on $X_2 \cdots X_n Y$. Then, 

\begin{align}
H(XY)_p &= H_M(XY)_p \tag{16} \\
&= H_{x_1}(X_1) + \sum_{a_1} p(a_1|x_1)\, H_{M_{a_1}}(X_2 \cdots X_n Y)_{q_{a_1,x_1}} \tag{17}
\end{align}

where the first line follows from the definition of $M$, and the second line from the grouping 
axiom of the Shannon entropy$^1$ [23]. From (17) we see that whenever $a_1$ can actually occur (i.e. 
$p(a_1|x_1) > 0$), $M_{a_1}$ must be a measurement which achieves $H(X_2 \cdots X_n Y)_{q_{a_1,x_1}}$, else it would 
be possible to achieve a lower value for $H(XY)_p$. If $a_1$ cannot occur, we may just as well choose 
$M_{a_1}$ to be such a measurement. Therefore we have, 

\begin{align}
H(XY)_p &= H_{x_1}(X_1) + \sum_{a_1} p(a_1|x_1)\, H(X_2 \cdots X_n Y)_{q_{a_1,x_1}} \tag{18} \\
&= H_{x_1}(X_1) + \sum_{a_1} p(a_1|x_1)\, H(X_2 \cdots X_n)_{q_{a_1,x_1}} + H(Y)_p \tag{19}
\end{align}

where the second line uses the induction hypothesis, and the fact that the reduced state of $p$ 
on system $Y$ is the same as that of $q_{a_1,x_1}$. By the grouping axiom, we see that the sum of 
the first two terms on the right hand side of (19) is the Shannon entropy of the outcomes of a 
measurement on system $X$. Therefore, by the definition of $H$ it follows that 

$$H(XY)_p \ge H(X)_p + H(Y)_p. \tag{20}$$

Since the proof of lemma 5 works also under the restriction to basic measurements, we arrive at 
the result. □ 

Our aim is to investigate the set of GNST entropy vectors. Let us focus on the two party 
case. We know that a two party entropy vector is a vector $(x, y, z)$ in $\mathbb{R}^3$ such that $x, y, z \ge 0$ 
and $z \le x + y$. Are there any further constraints on the values of $x, y, z$? We will show that in 
fact these conditions are all. 

To this end, let $\Gamma$ be the set of points given by our necessary conditions: 

$$\Gamma := \{(x, y, z) \in \mathbb{R}^3 \mid x, y, z \ge 0,\ z \le x + y\}. \tag{21}$$



$^1$ Suppose that we partition the outcomes of a measurement into groups labelled $a_1, \dots, a_k$ and break up the 
measurement into two stages. First observe variable $A$ (which group the outcome is in) and second observe 
variable $B$ (the outcome from among that group). The grouping axiom states that the entropy of the overall 
measurement is equal to $H(A) + \sum_i p(a_i) H(B|A = a_i)$. 

Then $\Gamma$ is a closed, convex cone (i.e. $v \in \Gamma$ implies $\lambda v \in \Gamma$ for all $\lambda \in \mathbb{R}_{\ge 0}$, and whenever 
$v_1, v_2 \in \Gamma$, $v_1 + v_2 \in \Gamma$ also). This means that we can characterize $\Gamma$ either by the linear 
inequalities which bound it or, equivalently, by its extremal rays. These extremal rays are the 
vectors: 

\begin{align}
e_1 &:= (1,0,1) \tag{22} \\
e_2 &:= (0,1,1) \tag{23} \\
e_3 &:= (1,0,0) \tag{24} \\
e_4 &:= (0,1,0). \tag{25}
\end{align}

Any $v \in \Gamma$ can be written as $v = \lambda_1 e_1 + \lambda_2 e_2 + \lambda_3 e_3 + \lambda_4 e_4$, with $\lambda_i \ge 0$ for all $i$. 
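
One explicit way to obtain such a decomposition is sketched below in Python. The choice of coefficients is not unique; the rule used here (put as much weight as possible on the first ray) and the function names `decompose` and `recombine` are assumptions made for this illustration.

```python
RAYS = ((1, 0, 1), (0, 1, 1), (1, 0, 0), (0, 1, 0))  # extremal rays (22)-(25)


def decompose(v, tol=1e-12):
    """Return non-negative coefficients of v = (x, y, z) on the rays (22)-(25).

    Requires x, y, z >= 0 and z <= x + y, the conditions defining the cone (21).
    The coefficients are one valid choice among many.
    """
    x, y, z = v
    if min(v) < -tol or z > x + y + tol:
        raise ValueError("vector violates non-negativity or subadditivity")
    c1 = min(x, z)   # weight on (1, 0, 1)
    c2 = z - c1      # weight on (0, 1, 1); at most y by subadditivity
    c3 = x - c1      # weight on (1, 0, 0)
    c4 = y - c2      # weight on (0, 1, 0)
    return (c1, c2, c3, c4)


def recombine(coeffs):
    """Rebuild the vector from its coefficients on the four extremal rays."""
    return tuple(sum(c * ray[i] for c, ray in zip(coeffs, RAYS)) for i in range(3))
```

For instance, `decompose((1, 1, 0))` places all the weight on the last two rays, while `decompose((1, 1, 2))` uses only the first two.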
Consider the following joint probability distribution: 

$$p(a, b) = \begin{cases} \frac{1}{2} & \text{if } b = 0 \\ 0 & \text{else} \end{cases} \tag{26}$$

where a, b both take values in {0, 1}. Then we can consider p(a, b) to be the state of a two box 
system, in which each box has only one possible input. The entropies are then just the Shannon 
entropies of the different reduced states. Hence the entropy vector is $(1,0,1) = e_1$. We can 
similarly find a probability distribution achieving $e_2$, and it is not hard to generalize these to 
distributions achieving $\lambda e_1$ and $\lambda e_2$ for any $\lambda \ge 0$. (Indeed, consider the distribution of two 
random variables, one of which is deterministic, and the other with Shannon entropy $\lambda$.) 

Now consider a system of two boxes, $X$, $Y$, where $X$ has two possible inputs (0 and 1) and 
$N + 1$ outputs $(0, 1, \dots, N)$, and $Y$ is a random variable (i.e. only one input) with two possible 
outputs (0 and 1). The distribution $p(a, b|x)$ is as follows: 

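$$p(a, b|x = 0) = \begin{cases} \frac{1}{2} & \text{if } a = b = 0 \\ \frac{1}{2N} & \text{if } a \ge 1,\ b = 1 \\ 0 & \text{else} \end{cases} \tag{27}$$

$$p(a, b|x = 1) = \begin{cases} \frac{1}{2} & \text{if } a = 0,\ b = 1 \\ \frac{1}{2N} & \text{if } a \ge 1,\ b = 0 \\ 0 & \text{else} \end{cases} \tag{28}$$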



Since we have only two boxes, we know that the only allowed measurements are the basic 
measurements (or probabilistic mixtures of basic measurements, but a mixed measurement would 
not be optimal for achieving the minimum in (11)). This makes it easy to calculate the entropies. 
The distribution of $X$ alone, for either input, is $(\frac{1}{2}, \frac{1}{2N}, \dots, \frac{1}{2N})$ and hence $H(X) = 1 + \frac{1}{2} \log N$. 
The reduced distribution of $Y$ is $(\frac{1}{2}, \frac{1}{2})$ and so $H(Y) = 1$. Now consider the following 
measurement. First observe $Y$ to obtain output $b$. If $b = 0$ set $x = 0$, otherwise set $x = 1$. Now observe 
$X$. With certainty we will find that $a = 0$, and the distribution of the measurement outcomes is 
$(\frac{1}{2}, \frac{1}{2})$. This implies that $H(XY) \le 1$, and in fact this measurement is optimal, i.e. $H(XY) = 1$. 

We have discovered that for every $N$, $(1 + \frac{1}{2} \log N, 1, 1)$ is an entropy vector. 
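
The entropy vector just computed can be reproduced by brute force. The Python sketch below enumerates every basic measurement on the two-box system (measure X first, or measure Y first and choose the input of X from the outcome) and takes the minimum Shannon entropy, following definition (11). It assumes, as stated above, that mixtures of basic measurements never improve on the minimum. The state is encoded as read from (27)-(28); the names `state`, `shannon` and `entropy_vector` are hypothetical, and logarithms are taken to base 2.

```python
import itertools
import math


def shannon(probs):
    """Shannon entropy in bits of a list of probabilities (zeros are skipped)."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def state(N):
    """The state of equations (27)-(28), stored as p[(a, b, x)]."""
    p = {}
    for a, b, x in itertools.product(range(N + 1), (0, 1), (0, 1)):
        if a == 0 and b == x:
            p[a, b, x] = 0.5
        elif a >= 1 and b == 1 - x:
            p[a, b, x] = 1 / (2 * N)
        else:
            p[a, b, x] = 0.0
    return p


def entropy_vector(N):
    """(H(X), H(Y), H(XY)) minimized over basic measurements only."""
    p = state(N)
    outs_a, outs_b = range(N + 1), (0, 1)
    h_x = min(shannon([sum(p[a, b, x] for b in outs_b) for a in outs_a])
              for x in (0, 1))
    h_y = shannon([sum(p[a, b, 0] for a in outs_a) for b in outs_b])
    # X measured first with input x; Y has a single input.
    candidates = [shannon([p[a, b, x] for a in outs_a for b in outs_b])
                  for x in (0, 1)]
    # Y measured first; the input of X is f[b], a function of the outcome b.
    for f in itertools.product((0, 1), repeat=2):
        candidates.append(shannon([p[a, b, f[b]] for a in outs_a for b in outs_b]))
    return h_x, h_y, min(candidates)
```

Calling `entropy_vector(4)` evaluates the case $N = 4$, where $1 + \frac{1}{2}\log N = 2$; the minimum for the joint entropy is attained by the measurement described above (observe Y, then set x equal to the outcome).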

We now alter this scenario by adding an extra character, $\infty$, to the output alphabet of both 
boxes. Now for each $N \in \mathbb{N}$ consider the following distribution: 



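$$p(a, b|x = 0) = \begin{cases} \frac{\lambda_N}{2} & \text{if } a = b = 0 \\ \frac{\lambda_N}{2N} & \text{if } a \ge 1,\ b = 1 \\ 1 - \lambda_N & \text{if } a = b = \infty \\ 0 & \text{else} \end{cases} \tag{29}$$

$$p(a, b|x = 1) = \begin{cases} \frac{\lambda_N}{2} & \text{if } a = 0,\ b = 1 \\ \frac{\lambda_N}{2N} & \text{if } a \ge 1,\ b = 0 \\ 1 - \lambda_N & \text{if } a = b = \infty \\ 0 & \text{else} \end{cases} \tag{30}$$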



where $\lambda_N$ are (as yet unspecified) constants (between 0 and 1). Fix a positive real value, $k$, 
and set $\lambda_N = \frac{2k}{\log N}$. By the same reasoning as above, this distribution has entropy vector 

$$\left(\lambda_N + \tfrac{\lambda_N}{2} \log N + h(\lambda_N),\ \lambda_N + h(\lambda_N),\ \lambda_N + h(\lambda_N)\right) = \left(\lambda_N + k + h(\lambda_N),\ \lambda_N + h(\lambda_N),\ \lambda_N + h(\lambda_N)\right),$$

where $h(q) := -q \log q - (1 - q) \log(1 - q)$. As $N \to \infty$, $\lambda_N$ and $h(\lambda_N) \to 0$ and so we have 
found entropy vectors arbitrarily close to $(k, 0, 0)$. 
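
The convergence can be checked numerically. The Python sketch below evaluates the entropy vector above as a function of $\log N$, taking $\lambda_N = 2k / \log N$ (the choice that makes $\frac{\lambda_N}{2} \log N$ equal to $k$; this value is inferred from the displayed identity). Working with `log_n` directly, rather than with N, is a convenience of this example, and the function names are hypothetical.

```python
import math


def binary_entropy(q):
    """h(q) = -q log q - (1 - q) log(1 - q), in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def entropy_vector(k, log_n):
    """Entropy vector of the state (29)-(30), assuming lambda_N = 2k / log N.

    log_n stands for log2(N); it must be at least 2k so that lambda_N <= 1.
    """
    lam = 2 * k / log_n
    if not 0 <= lam <= 1:
        raise ValueError("need log N >= 2k so that lambda_N lies in [0, 1]")
    h_x = lam * (1 + 0.5 * log_n) + binary_entropy(lam)
    h_y = lam + binary_entropy(lam)
    return (h_x, h_y, h_y)
```

The approach to $(k, 0, 0)$ is slow, since the error terms decay only like $1 / \log N$ up to a logarithmic factor; for $k = 1$ the second and third components are still of order $10^{-2}$ when $\log N = 2000$.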

Theorem 7. Every vector in $\Gamma$ is in the closure of the set of entropy vectors. 

Proof. Consider an arbitrary vector $v \in \Gamma$, $v = \lambda_1 e_1 + \lambda_2 e_2 + \lambda_3 e_3 + \lambda_4 e_4$. We have found states 
$p_1, p_2, p_3, p_4$ whose entropy vectors are (arbitrarily close to) $\lambda_1 e_1, \lambda_2 e_2, \lambda_3 e_3, \lambda_4 e_4$ respectively, 
and $p_1, p_2$ are entirely classical, whereas $p_3, p_4$ have only one non-classical box. If we take 
$\sigma = p_1 \otimes p_2 \otimes p_3 \otimes p_4$, then $\sigma$ has only two non-classical boxes. By lemma 2 the only measurements 
on $\sigma$ are basic measurements, hence lemma 6 tells us that the measurement entropy is additive 
on product states here. Consequently, the entropy vector of $\sigma$ is (arbitrarily close to) $v$. □ 



V. RELATION TO NON-LOCALITY 



In the previous section we gave the proof of the main technical result of the paper, but we 
have not yet delivered the punch line. The alert reader would have noticed that all the states 
used in the proof of theorem 7 are separable GNST states (and hence local in that they admit 
a local hidden variable description). For example, the state given by (27) and (28) can be 
decomposed in the following way: 



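$$p(a, b|x) = \tfrac{1}{2} q_1(a|x) r_1(b) + \tfrac{1}{2} q_2(a|x) r_2(b) \tag{31}$$

where $r_1(0) = 1$, $r_1(1) = 0$, $r_2(0) = 0$, $r_2(1) = 1$, and 

$$q_1(a|x) = \begin{cases} 1 & \text{if } a = x = 0 \\ \frac{1}{N} & \text{if } a \ge 1,\ x = 1 \\ 0 & \text{else} \end{cases} \tag{32}$$

$$q_2(a|x) = \begin{cases} 1 & \text{if } a = 0,\ x = 1 \\ \frac{1}{N} & \text{if } a \ge 1,\ x = 0 \\ 0 & \text{else} \end{cases} \tag{33}$$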

Consequently, the theorem could just as easily have read: 

Theorem 7'. Every bipartite GNST entropy vector is in the closure of the set of entropy vectors 
of separable GNST states. 
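
The separable decomposition (31)-(33) of the state (27)-(28) can be verified entry by entry. The following Python sketch does so with exact rational arithmetic; the piecewise definitions are transcribed from those equations, and the function names are hypothetical.

```python
import itertools
from fractions import Fraction


def state(N, a, b, x):
    """p(a, b | x) of equations (27)-(28)."""
    if a == 0 and b == x:
        return Fraction(1, 2)
    if a >= 1 and b == 1 - x:
        return Fraction(1, 2 * N)
    return Fraction(0)


def q1(N, a, x):
    """Local box of equation (32)."""
    if a == 0 and x == 0:
        return Fraction(1)
    if a >= 1 and x == 1:
        return Fraction(1, N)
    return Fraction(0)


def q2(N, a, x):
    """Local box of equation (33)."""
    if a == 0 and x == 1:
        return Fraction(1)
    if a >= 1 and x == 0:
        return Fraction(1, N)
    return Fraction(0)


def r1(b):
    """Deterministic distribution with r1(0) = 1."""
    return Fraction(1 if b == 0 else 0)


def r2(b):
    """Deterministic distribution with r2(1) = 1."""
    return Fraction(1 if b == 1 else 0)


def decomposition_matches(N):
    """Check equation (31) for every output pair (a, b) and input x."""
    return all(
        state(N, a, b, x) == Fraction(1, 2) * (q1(N, a, x) * r1(b) + q2(N, a, x) * r2(b))
        for a, b, x in itertools.product(range(N + 1), (0, 1), (0, 1))
    )
```

Both `q1` and `q2` are normalized single-box states for each input, so (31) is an equal mixture of two product states.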

Suppose that we are given a bipartite GNST state and told its entropy vector, which is known 
to be accurate to within some $\epsilon$, which can be arbitrarily small. Then the theorem tells us that 
we gain no knowledge of the non-local properties of the state. Whatever the entropy vector, the 
state may or may not be separable. This is in stark contrast with the von Neumann entropy, 
for which any vector with $S(AB) < S(A)$ instantly reveals that the state is entangled. 

A further word of clarification. We showed that the measurement entropy in GNST satisfies 
only subadditivity and non-negativity and (at least for 2 parties) no further inequalities. (In fact 
we have evidence leading us to conjecture that this could also be true for 3 parties). This shows 
that entropy vectors in GNST are more general than entropy vectors in quantum theory. It 
would have been tempting to conclude that the reason for this is the extra non-locality available 
in GNST, but we have seen that this is not the case. What, then, is the cause? 



Consider the following implementation of the state (27)-(28) using classical random variables. 




In this system there are three classical random variables $X_0$, $X_1$ and $Y$. $X_0$ and $X_1$ are 
concealed within box $X$ and are arranged such that: 

• If Alice inputs $x = 0$, then $a = a_0$ and $X_1$ is destroyed. 

• If Alice inputs $x = 1$, then $a = a_1$ and $X_0$ is destroyed. 

The distribution of $X_0, X_1, Y$ is as follows: 



$$p(a_0, a_1, b) = \begin{cases} \frac{1}{2N} & \text{if } a_0 = 0,\ a_1 \ge 1,\ b = 0 \ \text{ or } \ a_0 \ge 1,\ a_1 = 0,\ b = 1 \\ 0 & \text{else} \end{cases} \tag{34}$$



which gives the same distribution as (27)-(28) for $p(a, b|x)$. 



This raises an obvious question: if this GNST state can be realized via classical probability 
theory, why can't its entropies also be obtained there? The Shannon entropy vector of (34), 
considered as a bipartite state, is $(1 + \log N, 1, 1 + \log N)$ compared with the GNST entropy 
vector $(1 + \frac{1}{2} \log N, 1, 1)$. But the Shannon entropy is none other than the measurement entropy 
in the classical setting. Since the same measurement that achieved $H(XY) = 1$ in GNST can 
also be performed classically, surely also $H(XY) \le 1$? 

The reason this is not true is that, although this measurement can be performed in the classical 
setting, it is not maximally informative there, since classically a maximally informative 
measurement must give the outputs of both $X_0$ and $X_1$. By hiding some of the variables within 
the boxes, GNST artificially turns measurements that are not maximally informative on the 
underlying classical random variables into maximally informative ones. This mechanism is the 
reason that GNST entropy vectors are more general than classical ones. 
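
The comparison between the classical implementation (34) and the box statistics (27)-(28) can be reproduced with a short Python sketch. It builds the joint distribution of the three hidden variables, derives the statistics seen by Alice and Bob for each input, and computes the Shannon entropy vector of (34) treated as a bipartite state. All function names are hypothetical, and logarithms are taken to base 2.

```python
import math
from collections import defaultdict
from fractions import Fraction


def classical_joint(N):
    """Distribution (34) over (a0, a1, b); only non-zero entries are stored."""
    p = {}
    for a1 in range(1, N + 1):
        p[0, a1, 0] = Fraction(1, 2 * N)
    for a0 in range(1, N + 1):
        p[a0, 0, 1] = Fraction(1, 2 * N)
    return p


def box_statistics(N, x):
    """p(a, b | x): input x reveals a_x and discards the other hidden variable."""
    out = defaultdict(Fraction)
    for (a0, a1, b), prob in classical_joint(N).items():
        out[(a0, a1)[x], b] += prob
    return dict(out)


def shannon(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(float(q) * math.log2(float(q)) for q in probs if q > 0)


def classical_entropy_vector(N):
    """Shannon entropies of (X0, X1), of Y, and of (X0, X1, Y) for (34)."""
    joint = classical_joint(N)
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (a0, a1, b), prob in joint.items():
        p_x[a0, a1] += prob
        p_y[b] += prob
    return (shannon(p_x.values()), shannon(p_y.values()), shannon(joint.values()))
```

For $N = 4$, `box_statistics` reproduces the entries of (27) and (28), while `classical_entropy_vector(4)` evaluates the vector $(1 + \log N, 1, 1 + \log N)$ quoted above, which differs from the GNST vector because the classical measurement must reveal both hidden variables.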



VI. DISCUSSION 

When faced with the task of naming his entropy, Shannon was apparently told by von Neumann 
to call it 'entropy' because "nobody knows what entropy really is, so in a debate you 
will always have the advantage". Sixty years on this is still true. If 'entropy' means 'the 
minimum amount of uncertainty of a system under a maximally informative measurement' (and we 
accept the Shannon entropy as synonymous with 'uncertainty') then we are forced to accept 
the measurement entropy as the unique entropy in any physical theory. However, this is not 
the prevailing definition. If we take a more pragmatic approach, and allow any function to be 
deemed an 'entropy' if it satisfies a certain set of properties, then the question becomes: which 
properties do we choose? 

This is where the result of the previous section fits. If you consider links between the von 
Neumann entropy and non-locality to be a happy coincidence, then this result has no bearing 
on the measurement entropy. If, however, you consider that in a highly non-local theory, such 
as GNST, an 'entropy' ought to reflect this non-locality, then the measurement entropy cannot 
really be an 'entropy'. 

It would be interesting to explore the existence of a 'better' entropy than measurement 
entropy in GNST. We know from [25] that there is no function which obeys all the same desirable 
properties that the von Neumann entropy does in the quantum regime. However, could there 
be a function with the same desirable properties as the measurement entropy that also detects 
non-locality? Another interesting problem would be to determine whether or not the analogue 
of Theorem 7' holds in other GPTs which are between quantum theory and GNST. 

Ultimately, the Shannon and von Neumann entropies are useful functions, not because of 
the desirable properties they have, but because of their impact on physics and information 
theory. Their importance lies in the fact they can be used to give expressions for optimal rates 
of compression, or for channel capacities. To the best of our knowledge, only one such theorem 
is known using the measurement entropy [T^. Other such theorems would be the best way to 
prove the usefulness of 'entropy' measures. 

Appendix A: Appendix 



In this appendix we prove two of the lemmas stated in section III. 



Lemma 2. Let $A_1, A_2, B_1, \dots, B_n$ be a system of boxes, where only $A_1$ and $A_2$ are not classical. 
Then all measurements on this system are basic, or mixtures of basic measurements. 

Proof. Let $\mathbf a = (a_1, a_2)$ denote the outputs of the 2 non-classical boxes, and $\mathbf x = (x_1, x_2)$ denote 
their inputs. Let $\mathbf b = (b_1, \dots, b_n)$ denote the outputs of the classical boxes (since these boxes 
have no choice of input, we omit their inputs from the notation). Let $M = \{(r, \mu_r)\}$ be an 
arbitrary measurement on the system, with $\{R_r\}$ a set of representing vectors given by Lemma 1. 

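For each $\mathbf b$ let $R_r^{(\mathbf b)}$ be the vector with components $R_r^{(\mathbf b)}(\mathbf a|\mathbf x) = R_r(\mathbf a, \mathbf b|\mathbf x)$. Now, for fixed 
$\mathbf b$, we claim that $\{R_r^{(\mathbf b)}\}$ represent a measurement, $M^{(\mathbf b)}$, on the non-classical part of the system. 
To see this, note that whenever $p(\mathbf a|\mathbf x)$ is a state on $A_1, A_2$, we have: 

$$\sum_{r, \mathbf a, \mathbf x} p(\mathbf a|\mathbf x)\, R_r^{(\mathbf b)}(\mathbf a|\mathbf x) = \sum_{r, \mathbf a, \mathbf b', \mathbf x} p(\mathbf a|\mathbf x)\, \delta_{\mathbf b \mathbf b'}\, R_r(\mathbf a, \mathbf b'|\mathbf x) = 1 \tag{A1}$$

where the last equality follows from the fact that $p(\mathbf a|\mathbf x)\, \delta_{\mathbf b \mathbf b'}$ is an allowed state of the overall 
system. 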

Since $M^{(\mathbf b)}$ is a measurement on a two box system, it must be a mixture of basic measurements 
[18]. The following is, therefore, also a mixture of basic measurements on $A_1, A_2, B_1, \dots, B_n$. 

(i) Obtain outputs $\mathbf b$. 

(ii) Perform measurement $M^{(\mathbf b)}$. 






But, in fact, this measurement is M, since for this measurement: 

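\begin{align}
\mathrm{Prob}(r) &= \sum_{\mathbf b} p(\mathbf b)\, \mathrm{Prob}(r|\mathbf b) \tag{A2} \\
&= \sum_{\mathbf b} p(\mathbf b) \sum_{\mathbf a, \mathbf x} R_r^{(\mathbf b)}(\mathbf a|\mathbf x)\, p(\mathbf a|\mathbf b, \mathbf x) \tag{A3} \\
&= \sum_{\mathbf b} p(\mathbf b) \sum_{\mathbf a, \mathbf x} R_r^{(\mathbf b)}(\mathbf a|\mathbf x)\, \frac{p(\mathbf a, \mathbf b|\mathbf x)}{p(\mathbf b)} \tag{A4} \\
&= \sum_{\mathbf a, \mathbf b, \mathbf x} R_r(\mathbf a, \mathbf b|\mathbf x)\, p(\mathbf a, \mathbf b|\mathbf x) \tag{A5}
\end{align}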

which is the same as the probabihty of getting outcome r in measurement M. Here, p(a|b, x) is 
the probability of getting output a from systems Ai , A2 when we input x given knowledge of the 
outputs b from the classical boxes. This is equal to p(b)^^p(a, b|x) by the rules of conditional 
probability and the no-signalling condition. □ 
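
The chain (A2)-(A5) can be checked on a small example. The following is a minimal sketch rather than
part of the proof: the label sets, the toy state and the vector `R` are all assumptions chosen for
illustration, the outputs and inputs of the non-classical part are collapsed to single labels `a` and
`x`, and the helper names `prob_direct` and `prob_two_step` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

A, X, B = range(2), range(2), range(2)  # toy label sets (assumed)

# Toy joint state p(a, b | x) = p(b) * p(a | b, x); p(b) does not depend on x.
p_b = {0: F(1, 3), 1: F(2, 3)}
p_a_given_bx = {  # keys are (b, x); values map a -> probability
    (0, 0): {0: F(1, 2), 1: F(1, 2)},
    (0, 1): {0: F(1, 4), 1: F(3, 4)},
    (1, 0): {0: F(1), 1: F(0)},
    (1, 1): {0: F(1, 5), 1: F(4, 5)},
}
p = {(a, b, x): p_b[b] * p_a_given_bx[(b, x)][a] for a, b, x in product(A, B, X)}

# An arbitrary vector R_r(a, b | x) for one outcome r (illustrative values only).
R = {(a, b, x): F(1 + a + 2 * b + 3 * x, 40) for a, b, x in product(A, B, X)}


def prob_direct(R, p):
    """Right-hand side of (A5): sum over a, b, x of R_r(a,b|x) p(a,b|x)."""
    return sum(R[k] * p[k] for k in p)


def prob_two_step(R, p, p_b):
    """(A2)-(A4): read b first, then apply R_r^(b)(a|x) = R_r(a,b|x) to p(a|b,x)."""
    total = F(0)
    for b in B:
        if p_b[b] == 0:
            continue  # outputs b that never occur contribute nothing
        inner = sum(R[(a, b, x)] * p[(a, b, x)] / p_b[b] for a, x in product(A, X))
        total += p_b[b] * inner
    return total


assert prob_direct(R, p) == prob_two_step(R, p, p_b)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to
a floating-point tolerance.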

Lemma 3. A measurement $M = \{(r, \mu_r)\}$ is maximally informative if and only if every effect
can be represented by a vector with only one non-zero entry, which is between 0 and 1.

To make the proof of this lemma clearer, we first introduce two simple lemmas.

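Lemma 8. Let $(\mathbf{a}_1, \mathbf{x}_1)$ and $(\mathbf{a}_2, \mathbf{x}_2)$ be output-input pairs of a GNST system,
with not both $\mathbf{a}_1 = \mathbf{a}_2$ and $\mathbf{x}_1 = \mathbf{x}_2$. Then there exists an allowed state, $p$,
such that $p(\mathbf{a}_1|\mathbf{x}_1) = 0$ and $p(\mathbf{a}_2|\mathbf{x}_2) > 0$.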

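Proof. First suppose that $\mathbf{a}_1 \neq \mathbf{a}_2$. Then we can choose $p$ to be the distribution:

\[
p(\mathbf{a}|\mathbf{x}) = \delta_{\mathbf{a}\mathbf{a}_2} \tag{A6}
\]

where $\delta_{xy}$ is 1 if $x = y$ and 0 otherwise.

Now suppose that $\mathbf{a}_1 = \mathbf{a}_2$. This means that we must have $\mathbf{x}_1 \neq \mathbf{x}_2$. Suppose
that $\mathbf{x}_1$ and $\mathbf{x}_2$ disagree in the $i$th entry. Let $\mathbf{a}_1'$ be a vector of outputs which
disagrees with $\mathbf{a}_1$ only in the $i$th entry. We can then choose $p$ to be the distribution:

\[
p(\mathbf{a}|\mathbf{x}) =
\begin{cases}
\delta_{\mathbf{a}\mathbf{a}_1'} & \text{if } \mathbf{x}, \mathbf{x}_1 \text{ agree in the } i\text{th entry}, \\
\delta_{\mathbf{a}\mathbf{a}_1} & \text{else.}
\end{cases} \tag{A7}
\]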

□ 
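
The two constructions (A6) and (A7) can be written out explicitly. The sketch below is an illustration
under stated assumptions, not the authors' code: outputs and inputs are encoded as tuples of integers,
every box is assumed to have at least two outputs, the alternative output vector is obtained by
incrementing the $i$th output modulo the number of outputs, and the function names `lemma8_state` and
`is_non_signalling` are hypothetical.

```python
from itertools import product


def lemma8_state(a1, x1, a2, x2, n_outputs):
    """Return p(a, x) with p(a1|x1) = 0 and p(a2|x2) > 0, following (A6)/(A7)."""
    if (a1, x1) == (a2, x2):
        raise ValueError("the two output-input pairs must differ")
    if a1 != a2:
        # (A6): always output a2, whatever the input.
        return lambda a, x: 1 if a == a2 else 0
    # a1 == a2, so the inputs differ in some entry i.
    i = next(k for k in range(len(x1)) if x1[k] != x2[k])
    a1_alt = list(a1)
    a1_alt[i] = (a1[i] + 1) % n_outputs[i]  # differs from a1 only in entry i
    a1_alt = tuple(a1_alt)

    def p(a, x):
        # (A7): box i outputs the altered value exactly when its input matches x1[i].
        target = a1_alt if x[i] == x1[i] else a1
        return 1 if a == target else 0

    return p


def is_non_signalling(p, n_inputs, n_outputs):
    """Check that summing out box i's output removes all dependence on box i's input."""
    all_a = list(product(*(range(k) for k in n_outputs)))
    all_x = list(product(*(range(k) for k in n_inputs)))
    for i in range(len(n_inputs)):
        for x in all_x:
            for xi in range(n_inputs[i]):
                x_alt = x[:i] + (xi,) + x[i + 1:]
                marg, marg_alt = {}, {}
                for a in all_a:
                    rest = a[:i] + a[i + 1:]
                    marg[rest] = marg.get(rest, 0) + p(a, x)
                    marg_alt[rest] = marg_alt.get(rest, 0) + p(a, x_alt)
                if marg != marg_alt:
                    return False
    return True


# Two binary-input, binary-output boxes; same outputs, inputs differing in the second entry.
p = lemma8_state((0, 0), (0, 0), (0, 0), (0, 1), n_outputs=(2, 2))
assert p((0, 0), (0, 0)) == 0 and p((0, 0), (0, 1)) == 1
assert is_non_signalling(p, n_inputs=(2, 2), n_outputs=(2, 2))
```

Both constructions are deterministic product states, which is why the non-signalling check passes:
each box's output depends at most on its own input.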



Lemma 9. Suppose that $R$ and $S$ are vectors representing the effect $\mu$, such that $R$ has exactly
one non-zero entry, and $S$ has no negative entries. Then $R = S$.

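Proof. Let $d = R - S$. Since $R$, $S$ both represent the same effect, it must be the case that
$d \cdot p = 0$ for all states $p$. We aim to show that $d = 0$.

Let $(\mathbf{a}_1, \mathbf{x}_1)$ be such that $R(\mathbf{a}_1|\mathbf{x}_1) > 0$. Let $(\mathbf{a}_2, \mathbf{x}_2)$ be a
distinct, but otherwise arbitrary, output-input pair. Note that $d(\mathbf{a}_2|\mathbf{x}_2) \leq 0$, since
$R(\mathbf{a}_2|\mathbf{x}_2) = 0$ and $S$ has no negative entries. Now choose $p$ according to the previous
lemma, and notice that for this choice of $p$, $d \cdot p$ will be negative, unless
$d(\mathbf{a}_2|\mathbf{x}_2) = 0$. But $\mathbf{a}_2, \mathbf{x}_2$ were arbitrary, so in fact $d(\mathbf{a}_1|\mathbf{x}_1)$ is the
only possibly non-zero component of $d$. Finally, let $p$ be the distribution
$p(\mathbf{a}|\mathbf{x}) = \delta_{\mathbf{a}\mathbf{a}_1}$ and then $d \cdot p = 0$ implies that, in fact, $d = 0$. □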

Proof of lemma 3. Suppose that $\mu$ is an effect which can be represented by a vector, $R$, with
only one non-zero entry. Suppose also that $\mu = \sum_i \nu_i$ for some effects $\nu_i$. For any vectors
$S_i$ which represent $\nu_i$, the vector $\sum_i S_i$ represents $\mu$. By lemma 1 we can choose the $S_i$
so that they have no negative entries. Then, by lemma 9, this means that $R = \sum_i S_i$. This implies
that $S_i \propto R$, and hence $\nu_i \propto \mu$ for all $i$. Thus, if all the effects of a measurement can
be represented in this way, then the measurement must be maximally informative.

Conversely, suppose $\mu$ cannot be represented by such a vector. Let $R$ be a vector with entries
between 0 and 1 which represents $\mu$. $R$ must have more than one non-zero entry. Let $R_1$ be the
vector which shares $R$'s first non-zero entry, and has zeroes elsewhere, and let $R_2 = R - R_1$.
Then $R_1$ and $R_2$ both represent valid effects $\nu_1$, $\nu_2$ with $\nu_1 + \nu_2 = \mu$. Now, if
$\nu_1 \propto \mu$ then for some constant $\lambda$ we have $\lambda \nu_1 = \mu$ and hence $\lambda R_1$ represents
$\mu$. But by lemma 9 this implies that $\lambda R_1 = R$, which is clearly false. Consequently, there must
exist a non-trivial refinement of any measurement containing $\mu$. □

Remark. In the proof of lemma 8 (and hence also in lemmas 3 and 9) we assumed for simplicity
that each box has more than one possible output. In the (rather trivial) case that some boxes
have only one output, lemmas 8 and 9 do not hold. However, it is still possible to obtain lemma
3 by similar reasoning.

The key observation is the following. Suppose that we have a system of boxes, some of which
have fixed outputs. We denote these boxes by $X$, their inputs $\mathbf{x}$ and their outputs $\mathbf{0}$. The
remainder of the system, $Y$, has boxes with inputs $\mathbf{y}$ and outputs $\mathbf{b}$. We now show that we
can reduce the theory to one on system $Y$ only. The no-signalling constraints ensure that for
any state $p$ of the system, for all $\mathbf{x}, \mathbf{x}'$,
$p(\mathbf{0}, \mathbf{b}|\mathbf{x}, \mathbf{y}) = p(\mathbf{0}, \mathbf{b}|\mathbf{x}', \mathbf{y})$. This implies that a vector $R$
represents an effect $\mu$ if and only if the vector $R'$ also represents $\mu$, where




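\[
R'(\mathbf{0}, \mathbf{b}|\mathbf{x}, \mathbf{y}) =
\begin{cases}
\sum_{\mathbf{x}'} R(\mathbf{0}, \mathbf{b}|\mathbf{x}', \mathbf{y}) & \text{if } \mathbf{x} = \mathbf{0}, \\
0 & \text{otherwise.}
\end{cases} \tag{A8}
\]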



Thus the effect is essentially an effect on system $Y$:
$R''(\mathbf{b}|\mathbf{y}) := R'(\mathbf{0}, \mathbf{b}|\mathbf{0}, \mathbf{y})$. We can now run the proofs of lemmas 3, 8
and 9 for this effect.
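
The reduction to system $Y$ can be illustrated numerically. The sketch below rests on an assumption
about the form of $R'$: it takes $R'$ to be the vector that collects, for each $(\mathbf{b}, \mathbf{y})$, the sum of
$R$ over all inputs of $X$ and places it at input $\mathbf{x} = \mathbf{0}$, with zeroes elsewhere. The system sizes,
the toy state and the helper names `concentrate_on_x0` and `dot` are likewise hypothetical.

```python
from fractions import Fraction as F
from itertools import product

X_IN, Y_IN, B_OUT = range(2), range(2), range(2)  # toy sizes (assumed)

# Box X always outputs 0, so p(0, b | x, y) = q(b | y) for every x (no-signalling).
q = {(0, 0): F(1, 3), (1, 0): F(2, 3), (0, 1): F(3, 4), (1, 1): F(1, 4)}  # keys (b, y)
p = {(b, x, y): q[(b, y)] for b, x, y in product(B_OUT, X_IN, Y_IN)}

# An arbitrary vector R(0, b | x, y), indexed by (b, x, y); illustrative values only.
R = {(b, x, y): F(1 + b + 2 * x + 4 * y, 64) for b, x, y in product(B_OUT, X_IN, Y_IN)}


def concentrate_on_x0(R):
    """Assumed form of R': sum R over the inputs of X and place the result at x = 0."""
    return {
        (b, x, y): (sum(R[(b, xp, y)] for xp in X_IN) if x == 0 else F(0))
        for b, x, y in R
    }


def dot(R, p):
    """Inner product R . p over all entries."""
    return sum(R[k] * p[k] for k in R)


R_prime = concentrate_on_x0(R)
# Reduced vector on system Y alone: R''(b | y) := R'(0, b | 0, y).
R_reduced = {(b, y): R_prime[(b, 0, y)] for b, y in product(B_OUT, Y_IN)}

assert dot(R, p) == dot(R_prime, p) == sum(R_reduced[k] * q[k] for k in q)
```

The equality holds for every state of this form because $p$ does not depend on the input of $X$, so
only the sum of $R$ over those inputs enters $R \cdot p$.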



